Multi-mode guard for voice commands

ABSTRACT

Embodiments may be implemented by a computing device, such as a head-mountable display, in order to use a single guard phrase to enable different voice commands in different interface modes. An example device includes an audio sensor and a computing system configured to analyze audio data captured by the audio sensor to detect speech that includes a predefined guard phrase, and to operate in a plurality of different interface modes comprising at least a first and a second interface mode. During operation in the first interface mode, the computing system may initially disable one or more first-mode speech commands, and respond to detection of the guard phrase by enabling the one or more first-mode speech commands. During operation in the second interface mode, the computing system may initially disable a second-mode speech command, and respond to detection of the guard phrase by enabling the second-mode speech command.

BACKGROUND

Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

Computing devices such as personal computers, laptop computers, tablet computers, cellular phones, and countless types of Internet-capable devices are increasingly prevalent in numerous aspects of modern life. Over time, the manner in which these devices are providing information to users is becoming more intelligent, more efficient, more intuitive, and/or less obtrusive.

The trend toward miniaturization of computing hardware, peripherals, as well as of sensors, detectors, and image and audio processors, among other technologies, has helped open up a field sometimes referred to as “wearable computing.” In the area of image and visual processing and production, in particular, it has become possible to consider wearable displays that place a graphic display close enough to a wearer's (or user's) eye(s) such that the displayed image appears as a normal-sized image, such as might be displayed on a traditional image display device. The relevant technology may be referred to as “near-eye displays.”

Wearable computing devices with near-eye displays may also be referred to as “head-mountable displays” (HMDs), “head-mounted displays,” “head-mounted devices,” or “head-mountable devices.” A head-mountable display places a graphic display or displays close to one or both eyes of a wearer. To generate the images on a display, a computer processing system may be used. Such displays may occupy a wearer's entire field of view, or only occupy part of the wearer's field of view. Further, head-mounted displays may vary in size, taking a smaller form such as a glasses-style display or a larger form such as a helmet, for example.

Emerging and anticipated uses of wearable displays include applications in which users interact in real time with an augmented or virtual reality. Such applications can be mission-critical or safety-critical, such as in a public safety or aviation setting. The applications can also be recreational, such as interactive gaming. Many other applications are also possible.

SUMMARY

An example head-mountable device (HMD) may be operable to receive and interpret voice commands. In this and other contexts, it may be desirable to disable certain voice commands until a guard phrase is detected, in order to reduce the occurrence of false-positive detections of voice commands. It may also be desirable for the HMD to support speech commands in some places within a user interface (UI), but not in others. However, it can be challenging to make such a UI simple for users to understand, such that the user knows when certain speech commands are and are not available. Accordingly, an example HMD may be configured to respond to the same guard phrase in different ways, depending upon the state of the UI. In particular, an HMD may define a single, multi-modal guard phrase, and may also define multiple interface modes that correspond to different states of the HMD's UI. The same guard phrase may therefore be used to enable a different speech command or commands in different interface modes.

In one aspect, a device may include at least one audio sensor and a computing system configured to: (a) analyze audio data captured by the at least one audio sensor in order to detect speech that includes a predefined guard phrase and (b) operate in a plurality of different interface modes comprising at least a first and a second interface mode. During operation in the first interface mode, the computing system is configured to initially disable one or more first-mode speech commands, and to respond to detection of the guard phrase by enabling the one or more first-mode speech commands. During operation in the second interface mode, the computing system is configured to initially disable one or more second-mode speech commands, and to respond to detecting the guard phrase by enabling the one or more second-mode speech commands.

In another aspect, a computer-implemented method may involve: (a) a computing device operating in a first interface mode, wherein, during operation in the first interface mode, the computing device initially disables one or more first-mode speech commands, and responds to detection of a guard phrase by enabling the one or more first-mode speech commands; and (b) a computing device operating in a second interface mode, wherein, during operation in the second interface mode, the computing device initially disables one or more second-mode speech commands, and responds to detection of a guard phrase by enabling the one or more second-mode speech commands.

In a further aspect, a non-transitory computer readable medium may have stored therein instructions that are executable by a computing device to cause the computing device to perform functions comprising: (a) operating in a first interface mode, wherein the functions for operating in the first interface mode comprise initially disabling one or more first-mode speech commands, and responding to detection of a guard phrase by enabling the one or more first-mode speech commands; and (b) operating in a second interface mode, wherein the functions for operating in the second interface mode comprise initially disabling one or more second-mode speech commands, and responding to detection of a guard phrase by enabling the one or more second-mode speech commands.

In yet a further aspect, a system may include: (a) a means for causing a computing device to operate in a first interface mode, wherein, during operation in the first interface mode, the computing device initially disables one or more first-mode speech commands and responds to detection of a guard phrase by enabling the one or more first-mode speech commands; and (b) a means for causing a computing device to operate in a second interface mode, wherein, during operation in the second interface mode, the computing device initially disables one or more second-mode speech commands and responds to detection of a guard phrase by enabling the one or more second-mode speech commands.

These as well as other aspects, advantages, and alternatives will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows screen views of a user-interface during a transition between two interface modes, according to an example embodiment.

FIG. 2A illustrates a wearable computing system according to an example embodiment.

FIG. 2B illustrates an alternate view of the wearable computing device illustrated in FIG. 2A.

FIG. 2C illustrates another wearable computing system according to an example embodiment.

FIG. 2D illustrates another wearable computing system according to an example embodiment.

FIGS. 2E to 2G are simplified illustrations of the wearable computing system shown in FIG. 2D, being worn by a wearer.

FIG. 3A is a simplified block diagram of a computing device according to an example embodiment.

FIG. 3B shows a projection of an image by a head-mountable device, according to an example embodiment.

FIGS. 4A and 4B are flow charts illustrating methods, according to example embodiments.

FIGS. 5A to 5C illustrate applications of a multi-mode guard phrase, according to example embodiments.

DETAILED DESCRIPTION

Example methods and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features. In the following detailed description, reference is made to the accompanying figures, which form a part thereof. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein.

The example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.

I. Overview

A head-mountable device (HMD) may be configured to provide a voice interface, and as such, may be configured to listen for commands that are spoken by the wearer. Herein, spoken commands may be referred to interchangeably as either “voice commands” or “speech commands.”

When an HMD enables speech commands, the HMD may continuously listen for speech, so that a user can readily use the speech commands to interact with the HMD. In such a case, it may be desirable to implement a “guard phrase,” which the user must recite before the speech commands are enabled. By disabling voice commands until such a guard phrase is detected, an HMD may be able to reduce the occurrence of false positives. In other words, the HMD may be able to reduce instances where the HMD incorrectly interprets speech as including a particular speech command, and thus takes an undesired action. However, implementing such a guard phrase in a streamlined manner may be difficult, as users can perceive the need to speak a guard phrase before a speech command as an extra step that complicates a UI.

Further, it may also be desirable for an HMD to support speech commands in some places within a user interface (UI), but not in others. However, it can be challenging to make such a UI simple for users to understand (e.g., so that the user knows when speech commands are and are not available). This can be further complicated by the fact that different speech commands may be needed in different places within the UI.

According to an example embodiment, an HMD may be configured to respond to the same guard phrase in different ways, depending upon the state of the UI. In particular, an HMD may define multiple interface modes that correspond to different states of the HMD's UI. Each interface mode may have a different “hotword” model that, when loaded, listens for one or more speech commands that are specific to the particular interface mode.

In a further aspect, the HMD may define a single guard phrase to be used in multiple interface modes where speech commands are available. This guard phrase may be said to be a multi-modal guard phrase, since it is used in the same way across multiple interface modes. Notably, however, the actions taken by the HMD in response to detecting the guard phrase are non-modal, as the HMD does not change the interface mode when the guard phrase is detected. Rather, the guard phrase may be used to enable a different speech command or commands in different interface modes (e.g., by activating the different hotword processes specified by the different interface modes).

Configured as such, an example HMD may switch between different interface modes in order to operate in whichever interface mode corresponds to the current state of the UI (typically the interface mode that provides for speech command(s) that are useful in the current UI state). Each time the HMD switches to a different interface mode, the HMD may disable voice interaction (e.g., by unloading the previous mode's hotword process and/or refraining from loading the new mode's hotword process), and require that the user say the guard phrase in order to enable the new mode's speech commands. Then, when the HMD detects the guard phrase, the HMD enables the speech command(s) that are specific to the particular interface mode (e.g., by activating the hotword process for the interface mode).
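The mode-switching behavior described above can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions, not the implementation of any embodiment: the guard phrase "ok device", the mode names, the command strings, and the class name GuardedVoiceInterface are all hypothetical, and exact string matching stands in for a real hotword process.

```python
from dataclasses import dataclass, field

# Hypothetical guard phrase; the description does not specify one.
GUARD_PHRASE = "ok device"

# Hypothetical interface modes and their mode-specific speech commands.
MODE_COMMANDS = {
    "home": {"take a picture", "record a video"},
    "video_recording": {"stop recording"},
}


@dataclass
class GuardedVoiceInterface:
    mode: str = "home"
    enabled: set = field(default_factory=set)  # currently enabled commands

    def switch_mode(self, new_mode: str) -> None:
        """Enter a new interface mode with its speech commands disabled."""
        if new_mode not in MODE_COMMANDS:
            raise ValueError(f"unknown interface mode: {new_mode}")
        self.mode = new_mode
        # Analogous to unloading the previous mode's hotword process.
        self.enabled = set()

    def hear(self, phrase: str):
        """Return the recognized command, or None if the phrase is ignored."""
        phrase = phrase.strip().lower()
        if phrase == GUARD_PHRASE:
            # The guard phrase enables the current mode's commands only;
            # it does not change the interface mode.
            self.enabled = set(MODE_COMMANDS[self.mode])
            return None
        if phrase in self.enabled:
            return phrase
        return None
```

In this sketch the same guard phrase enables "record a video" in the home mode and "stop recording" in the video-recording mode, and every call to switch_mode requires the guard phrase to be spoken again.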

Additionally, in interface modes where speech commands can be enabled, the HMD may display a visual cue of the guard phrase, which can help alert a user that voice interaction can be enabled. And, once the HMD detects the guard phrase, the HMD may display a visual cue or cues that indicate the particular speech command(s) that have been enabled. By combining such visual cues with a multi-modal guard phrase, an HMD may allow for useful voice input in a manner that is more intuitive to the user.
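As a rough illustration of how the visual cues might track the voice state, the sketch below picks the cue text to display. The function name visual_cues, its parameters, and the cue wording are assumptions made for illustration; the description does not prescribe a particular cue format.

```python
def visual_cues(guard_phrase, mode_commands, commands_enabled):
    """Return the cue strings to display for the current voice state.

    guard_phrase: the multi-mode guard phrase (hypothetical value supplied
        by the caller).
    mode_commands: speech commands supported by the current interface mode.
    commands_enabled: True once the guard phrase has been detected.
    """
    if not mode_commands:
        # This mode offers no speech commands, so no cue is shown.
        return []
    if not commands_enabled:
        # Hint that voice interaction can be enabled with the guard phrase.
        return [f'say "{guard_phrase}"']
    # After the guard phrase, list the commands that are now enabled.
    return [f'"{command}"' for command in sorted(mode_commands)]
```

For instance, visual_cues("ok device", ["stop recording"], False) yields a single guard-phrase hint, while passing True instead yields the list of enabled commands.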

For example, FIG. 1 shows screen views of a UI during a transition between two interface modes in which a multi-mode guard phrase is implemented, according to an example embodiment.

More specifically, an HMD may operate in a first interface mode 101, where one or more first-mode speech commands can be enabled by speaking a predefined guard phrase. When the HMD switches to the first interface mode 101 from another interface mode, the HMD may initially disable the first-mode speech commands and display a visual cue for the guard phrase in its display, as shown in screen view 102. If the HMD detects the guard phrase while in the first interface mode, the HMD may enable the one or more first-mode speech commands, and display visual cues that indicate the enabled first-mode speech commands, as shown in screen view 104.

To provide a specific example, the first interface mode 101 may provide an interface for a home screen, which provides a launching point for a user to access a number of frequently-used features. Accordingly, when the user speaks a command to access a different feature, such as a camera or phone feature, the HMD may switch to the interface mode that provides an interface for the different feature.

More generally, when the HMD switches to a different aspect of its UI for which one or more second-mode speech commands are supported, the HMD may switch to a second interface mode 103. When the HMD switches to the second interface mode 103, the HMD may disable any speech commands that were enabled, and listen only for the guard phrase (e.g., by loading a guard-phrase hotword process). Further, the HMD may require the user to again speak the guard phrase before enabling the one or more second-mode speech commands.

To provide a hint to the user that the guard phrase will enable voice commands, the HMD may again display the visual cue for the guard phrase, as shown in screen view 106. And, if the HMD detects the guard phrase while in the second interface mode 103, the HMD may responsively enable the one or more second-mode speech commands (e.g., by loading the hotword process for the second interface mode). When the second-mode speech commands are enabled, the HMD may display visual cues that indicate the enabled second-mode speech commands, as shown in screen view 108.

Many implementations of a multi-mode guard phrase are possible. One implementation involves an HMD with a home screen, which serves as a launch point for various different features (some or all of which may provide for voice commands), one of which may be a video camera. Thus, from the home screen, the user may say the guard phrase followed by another speech command in order to launch a camera application. Further, in some embodiments, the HMD may automatically start recording when the user launches the camera application via the home screen. During video recording, the guard phrase may be displayed to indicate that a speech command can be enabled by saying the guard phrase. In particular, the user may say the guard phrase followed by “stop recording” (e.g., the speech command that can be enabled in the video-recording mode), in order to stop recording video. Other implementations are also possible.

In a further aspect, a second protective feature against false positives, in addition to the multi-mode guard phrase, may be utilized in some or all interface modes. In particular, a time-out process may be implemented in order to disable the enabled speech commands if at least one of the enabled speech commands is not detected within a predetermined period of time after detection of the guard phrase. For example, in the implementation described above, a time-out process may be implemented when the guard phrase is detected while the HMD is operating in the video-recording mode. As such, when the HMD detects the guard phrase, the HMD may start a timer. Then, if the HMD does not detect the “stop recording” speech command within five seconds, for example, the HMD may disable the “stop recording” speech command, and require the guard phrase in order to re-enable the “stop recording” speech command.
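The time-out process can be sketched as a timestamp comparison. This is an illustrative sketch only: the class name TimedGuard, its method names, and the injectable clock parameter are assumptions, and the five-second default simply mirrors the example value given above.

```python
import time


class TimedGuard:
    """Enable a mode's commands for a limited time after the guard phrase."""

    def __init__(self, commands, timeout_s=5.0, clock=time.monotonic):
        self.commands = set(commands)
        self.timeout_s = timeout_s
        self.clock = clock       # injectable so the timer can be tested
        self.enabled_at = None   # None means the commands are disabled

    def on_guard_phrase(self):
        """Start (or restart) the timer when the guard phrase is detected."""
        self.enabled_at = self.clock()

    def on_speech(self, phrase):
        """Return the command if it is enabled and in time, else None."""
        if self.enabled_at is None:
            return None
        if self.clock() - self.enabled_at > self.timeout_s:
            # Timed out: disable the commands until the guard phrase
            # is spoken again.
            self.enabled_at = None
            return None
        return phrase if phrase in self.commands else None
```

A monotonic clock is used here so that wall-clock adjustments do not shorten or lengthen the window; whether a real device would do the same is not stated in the description.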

II. Example Wearable Computing Devices

Systems and devices in which example embodiments may be implemented will now be described in greater detail. In general, an example system may be implemented in or may take the form of a wearable computer (also referred to as a wearable computing device). In an example embodiment, a wearable computer takes the form of or includes a head-mountable device (HMD).

An example system may also be implemented in or take the form of other devices that support speech commands, such as a mobile phone, tablet computer, laptop computer, or desktop computer, among other possibilities. Further, an example system may take the form of a non-transitory computer readable medium, which has program instructions stored thereon that are executable by at least one processor to provide the functionality described herein. An example system may also take the form of a device such as a wearable computer or mobile phone, or a subsystem of such a device, which includes such a non-transitory computer readable medium having such program instructions stored thereon.

An HMD may generally be any display device that is capable of being worn on the head and places a display in front of one or both eyes of the wearer. An HMD may take various forms such as a helmet or eyeglasses. As such, references to “eyeglasses” or a “glasses-style” HMD should be understood to refer to an HMD that has a glasses-like frame so that it can be worn on the head. Further, example embodiments may be implemented by or in association with an HMD with a single display or with two displays, which may be referred to as a “monocular” HMD or a “binocular” HMD, respectively.

FIG. 2A illustrates a wearable computing system according to an example embodiment. In FIG. 2A, the wearable computing system takes the form of a head-mountable device (HMD) 202 (which may also be referred to as a head-mounted display). It should be understood, however, that example systems and devices may take the form of or be implemented within or in association with other types of devices, without departing from the scope of the invention. As illustrated in FIG. 2A, the HMD 202 includes frame elements including lens-frames 204, 206 and a center frame support 208, lens elements 210, 212, and extending side-arms 214, 216. The center frame support 208 and the extending side-arms 214, 216 are configured to secure the HMD 202 to a user's face via a user's nose and ears, respectively.

Each of the frame elements 204, 206, and 208 and the extending side-arms 214, 216 may be formed of a solid structure of plastic and/or metal, or may be formed of a hollow structure of similar material so as to allow wiring and component interconnects to be internally routed through the HMD 202. Other materials may be possible as well.

One or both of the lens elements 210, 212 may be formed of any material that can suitably display a projected image or graphic. Each of the lens elements 210, 212 may also be sufficiently transparent to allow a user to see through the lens element. Combining these two features of the lens elements may facilitate an augmented reality or heads-up display where the projected image or graphic is superimposed over a real-world view as perceived by the user through the lens elements.

The extending side-arms 214, 216 may each be projections that extend away from the lens-frames 204, 206, respectively, and may be positioned behind a user's ears to secure the HMD 202 to the user. The extending side-arms 214, 216 may further secure the HMD 202 to the user by extending around a rear portion of the user's head. Additionally or alternatively, for example, the HMD 202 may connect to or be affixed within a head-mounted helmet structure. Other configurations for an HMD are also possible.

The HMD 202 may also include an on-board computing system 218, an image capture device 220, a sensor 222, and a finger-operable touch pad 224. The on-board computing system 218 is shown to be positioned on the extending side-arm 214 of the HMD 202; however, the on-board computing system 218 may be provided on other parts of the HMD 202 or may be positioned remote from the HMD 202 (e.g., the on-board computing system 218 could be connected to the HMD 202 by wire or wirelessly). The on-board computing system 218 may include a processor and memory, for example. The on-board computing system 218 may be configured to receive and analyze data from the image capture device 220 and the finger-operable touch pad 224 (and possibly from other sensory devices, user interfaces, or both) and generate images for output by the lens elements 210 and 212.

The image capture device 220 may be, for example, a camera that is configured to capture still images and/or to capture video. In the illustrated configuration, image capture device 220 is positioned on the extending side-arm 214 of the HMD 202; however, the image capture device 220 may be provided on other parts of the HMD 202. The image capture device 220 may be configured to capture images at various resolutions or at different frame rates. Many image capture devices with a small form-factor, such as the cameras used in mobile phones or webcams, for example, may be incorporated into an example of the HMD 202.

Further, although FIG. 2A illustrates one image capture device 220, more image capture devices may be used, and each may be configured to capture the same view, or to capture different views. For example, the image capture device 220 may be forward facing to capture at least a portion of the real-world view perceived by the user. This forward-facing image captured by the image capture device 220 may then be used to generate an augmented reality where computer-generated images appear to interact with or overlay the real-world view perceived by the user.

The sensor 222 is shown on the extending side-arm 216 of the HMD 202; however, the sensor 222 may be positioned on other parts of the HMD 202. For illustrative purposes, only one sensor 222 is shown. However, in an example embodiment, the HMD 202 may include multiple sensors. For example, an HMD 202 may include sensors 222 such as one or more gyroscopes, one or more accelerometers, one or more magnetometers, one or more light sensors, one or more infrared sensors, and/or one or more microphones. Other sensing devices may be included in addition or in the alternative to the sensors that are specifically identified herein.

The finger-operable touch pad 224 is shown on the extending side-arm 214 of the HMD 202. However, the finger-operable touch pad 224 may be positioned on other parts of the HMD 202. Also, more than one finger-operable touch pad may be present on the HMD 202. The finger-operable touch pad 224 may be used by a user to input commands. The finger-operable touch pad 224 may sense at least one of a pressure, position and/or a movement of one or more fingers via capacitive sensing, resistance sensing, or a surface acoustic wave process, among other possibilities. The finger-operable touch pad 224 may be capable of sensing movement of one or more fingers simultaneously, in addition to sensing movement in a direction parallel or planar to the pad surface, in a direction normal to the pad surface, or both, and may also be capable of sensing a level of pressure applied to the touch pad surface. In some embodiments, the finger-operable touch pad 224 may be formed of one or more translucent or transparent insulating layers and one or more translucent or transparent conducting layers. Edges of the finger-operable touch pad 224 may be formed to have a raised, indented, or roughened surface, so as to provide tactile feedback to a user when the user's finger reaches the edge, or other area, of the finger-operable touch pad 224. If more than one finger-operable touch pad is present, each finger-operable touch pad may be operated independently, and may provide a different function.

In a further aspect, HMD 202 may be configured to receive user input in various ways, in addition or in the alternative to user input received via finger-operable touch pad 224. For example, on-board computing system 218 may implement a speech-to-text process and utilize a syntax that maps certain spoken commands to certain actions. In addition, HMD 202 may include one or more microphones via which a wearer's speech may be captured. Configured as such, HMD 202 may be operable to detect spoken commands and carry out various computing functions that correspond to the spoken commands.
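One simple way to picture a syntax that maps spoken commands to actions is a lookup table keyed on the transcribed text. The sketch below assumes the speech-to-text step has already produced a transcript string; the command strings, the action functions, and the names COMMAND_SYNTAX and dispatch are hypothetical and chosen only for illustration.

```python
def take_picture():
    """Placeholder action; a real device would trigger the camera."""
    return "picture taken"


def start_recording():
    """Placeholder action; a real device would start video capture."""
    return "recording started"


# Hypothetical syntax: normalized command text mapped to an action.
COMMAND_SYNTAX = {
    "take a picture": take_picture,
    "record a video": start_recording,
}


def dispatch(transcript):
    """Run the action mapped to a transcript, or return None if unmapped."""
    # Normalize case and collapse repeated whitespace before the lookup.
    normalized = " ".join(transcript.lower().split())
    action = COMMAND_SYNTAX.get(normalized)
    return action() if action is not None else None
```

Exact-match lookup is the simplest possible syntax; a grammar that tolerates filler words or parameters would need a richer matcher.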

As another example, HMD 202 may interpret certain head-movements as user input. For example, when HMD 202 is worn, HMD 202 may use one or more gyroscopes and/or one or more accelerometers to detect head movement. The HMD 202 may then interpret certain head-movements as being user input, such as nodding, or looking up, down, left, or right. An HMD 202 could also pan or scroll through graphics in a display according to movement. Other types of actions may also be mapped to head movement.
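As a rough sketch of how head orientation might be turned into a directional input, the function below thresholds pitch and yaw angles. It assumes the angles have already been derived from gyroscope and/or accelerometer data; the function name, the sign conventions (positive pitch is up, positive yaw is right), and the 15-degree threshold are all assumptions made for illustration.

```python
def classify_head_movement(pitch_deg, yaw_deg, threshold_deg=15.0):
    """Map head rotation from a neutral pose to a direction label.

    pitch_deg: rotation about the side-to-side axis (assumed positive = up).
    yaw_deg: rotation about the vertical axis (assumed positive = right).
    Returns "up", "down", "left", "right", or None for small movements.
    """
    if abs(pitch_deg) < threshold_deg and abs(yaw_deg) < threshold_deg:
        return None  # too small to treat as deliberate input
    if abs(pitch_deg) >= abs(yaw_deg):
        return "up" if pitch_deg > 0 else "down"
    return "right" if yaw_deg > 0 else "left"
```

A gesture such as nodding would need a sequence of such readings over time rather than a single sample.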

As yet another example, HMD 202 may interpret certain gestures (e.g., by a wearer's hand or hands) as user input. For example, HMD 202 may capture hand movements by analyzing image data from image capture device 220, and initiate actions that are defined as corresponding to certain hand movements.

As a further example, HMD 202 may interpret eye movement as user input. In particular, HMD 202 may include one or more inward-facing image capture devices and/or one or more other inward-facing sensors (not shown) that may be used to track eye movements and/or determine the direction of a wearer's gaze. As such, certain eye movements may be mapped to certain actions. For example, certain actions may be defined as corresponding to movement of the eye in a certain direction, a blink, and/or a wink, among other possibilities.

HMD 202 also includes a speaker 225 for generating audio output. In one example, the speaker could be in the form of a bone conduction speaker, also referred to as a bone conduction transducer (BCT). Speaker 225 may be, for example, a vibration transducer or an electroacoustic transducer that produces sound in response to an electrical audio signal input. The frame of HMD 202 may be designed such that when a user wears HMD 202, the speaker 225 contacts the wearer. Alternatively, speaker 225 may be embedded within the frame of HMD 202 and positioned such that, when the HMD 202 is worn, speaker 225 vibrates a portion of the frame that contacts the wearer. In either case, HMD 202 may be configured to send an audio signal to speaker 225, so that vibration of the speaker may be directly or indirectly transferred to the bone structure of the wearer. When the vibrations travel through the bone structure to the bones in the middle ear of the wearer, the wearer can interpret the vibrations provided by BCT 225 as sounds.

Various types of bone-conduction transducers (BCTs) may be implemented, depending upon the particular implementation. Generally, any component that is arranged to vibrate the HMD 202 may be incorporated as a vibration transducer. Yet further, it should be understood that an HMD 202 may include a single speaker 225 or multiple speakers. In addition, the location(s) of speaker(s) on the HMD may vary, depending upon the implementation. For example, a speaker may be located proximate to a wearer's temple (as shown), behind the wearer's ear, proximate to the wearer's nose, and/or at any other location where the speaker 225 can vibrate the wearer's bone structure.

FIG. 2B illustrates an alternate view of the wearable computing device illustrated in FIG. 2A. As shown in FIG. 2B, the lens elements 210, 212 may act as display elements. The HMD 202 may include a first projector 228 coupled to an inside surface of the extending side-arm 216 and configured to project a display 230 onto an inside surface of the lens element 212. Additionally or alternatively, a second projector 232 may be coupled to an inside surface of the extending side-arm 214 and configured to project a display 234 onto an inside surface of the lens element 210.

The lens elements 210, 212 may act as a combiner in a light projection system and may include a coating that reflects the light projected onto them from the projectors 228, 232. In some embodiments, a reflective coating may not be used (e.g., when the projectors 228, 232 are scanning laser devices).

In alternative embodiments, other types of display elements may also be used. For example, the lens elements 210, 212 themselves may include: a transparent or semi-transparent matrix display, such as an electroluminescent display or a liquid crystal display, one or more waveguides for delivering an image to the user's eyes, or other optical elements capable of delivering an in-focus near-to-eye image to the user. A corresponding display driver may be disposed within the frame elements 204, 206 for driving such a matrix display. Alternatively or additionally, a laser or LED source and scanning system could be used to draw a raster display directly onto the retina of one or more of the user's eyes. Other possibilities exist as well.

FIG. 2C illustrates another wearable computing system according to an example embodiment, which takes the form of an HMD 252. The HMD 252 may include frame elements and side-arms such as those described with respect to FIGS. 2A and 2B. The HMD 252 may additionally include an on-board computing system 254 and an image capture device 256, such as those described with respect to FIGS. 2A and 2B. The image capture device 256 is shown mounted on a frame of the HMD 252. However, the image capture device 256 may be mounted at other positions as well.

As shown in FIG. 2C, the HMD 252 may include a single display 258 which may be coupled to the device. The display 258 may be formed on one of the lens elements of the HMD 252, such as a lens element described with respect to FIGS. 2A and 2B, and may be configured to overlay computer-generated graphics in the user's view of the physical world. The display 258 is shown to be provided in a center of a lens of the HMD 252; however, the display 258 may be provided in other positions, such as, for example, towards either the upper or lower portions of the wearer's field of view. The display 258 is controllable via the computing system 254 that is coupled to the display 258 via an optical waveguide 260.

FIG. 2D illustrates another wearable computing system according to an example embodiment, which takes the form of a monocular HMD 272. The HMD 272 may include side-arms 273, a center frame support 274, and a bridge portion with nosepiece 275. In the example shown in FIG. 2D, the center frame support 274 connects the side-arms 273. The HMD 272 does not include lens-frames containing lens elements. The HMD 272 may additionally include a component housing 276, which may include an on-board computing system (not shown), an image capture device 278, and a button 279 for operating the image capture device 278 (and/or usable for other purposes). Component housing 276 may also include other electrical components and/or may be electrically connected to electrical components at other locations within or on the HMD. HMD 272 also includes a BCT 286.

The HMD 272 may include a single display 280, which may be coupled to one of the side-arms 273 via the component housing 276. In an example embodiment, the display 280 may be a see-through display, which is made of glass and/or another transparent or translucent material, such that the wearer can see their environment through the display 280. Further, the component housing 276 may include the light sources (not shown) for the display 280 and/or optical elements (not shown) to direct light from the light sources to the display 280. As such, display 280 may include optical features that direct light that is generated by such light sources towards the wearer's eye, when HMD 272 is being worn.

In a further aspect, HMD 272 may include a sliding feature 284, which may be used to adjust the length of the side-arms 273. Thus, sliding feature 284 may be used to adjust the fit of HMD 272. Further, an HMD may include other features that allow a wearer to adjust the fit of the HMD, without departing from the scope of the invention.

FIGS. 2E to 2G are simplified illustrations of the HMD 272 shown in FIG. 2D, being worn by a wearer 290. As shown in FIG. 2F, BCT 286 is arranged such that when HMD 272 is worn, BCT 286 is located behind the wearer's ear. As such, BCT 286 is not visible from the perspective shown in FIG. 2E.

In the illustrated example, the display 280 may be arranged such that when HMD 272 is worn by a user, display 280 is positioned in front of or proximate to the user's eye. For example, display 280 may be positioned below the center frame support and above the center of the wearer's eye, as shown in FIG. 2E. Further, in the illustrated configuration, display 280 may be offset from the center of the wearer's eye (e.g., so that the center of display 280 is positioned to the right of and above the center of the wearer's eye, from the wearer's perspective).

Configured as shown in FIGS. 2E to 2G, display 280 may be located in the periphery of the field of view of the wearer 290, when HMD 272 is worn. Thus, as shown by FIG. 2F, when the wearer 290 looks forward, the wearer 290 may see the display 280 with their peripheral vision. As a result, display 280 may be outside the central portion of the wearer's field of view when their eye is facing forward, as it commonly is for many day-to-day activities. Such positioning can facilitate unobstructed eye-to-eye conversations with others, as well as generally providing unobstructed viewing and perception of the world within the central portion of the wearer's field of view. Further, when the display 280 is located as shown, the wearer 290 may view the display 280 by, e.g., looking up with their eyes only (possibly without moving their head). This is illustrated in FIG. 2G, where the wearer has moved their eyes to look up and align their line of sight with display 280. A wearer might also use the display by tilting their head down and aligning their eye with the display 280.

FIG. 3A is a simplified block diagram of a computing device 310 according to an example embodiment. In an example embodiment, device 310 communicates using a communication link 320 (e.g., a wired or wireless connection) to a remote device 330. The device 310 may be any type of device that can receive data and display information corresponding to or associated with the data. For example, the device 310 may take the form of or include a head-mountable display, such as the head-mounted devices 202, 252, or 272 that are described with reference to FIGS. 2A to 2G.

The device 310 may include a processor 314 and a display 316. The display 316 may be, for example, an optical see-through display, an optical see-around display, or a video see-through display. The processor 314 may receive data from the remote device 330, and configure the data for display on the display 316. The processor 314 may be any type of processor, such as a micro-processor or a digital signal processor, for example.

The device 310 may further include on-board data storage, such as memory 318 coupled to the processor 314. The memory 318 may store software that can be accessed and executed by the processor 314, for example.

The remote device 330 may be any type of computing device or transmitter including a laptop computer, a mobile telephone, head-mountable display, tablet computing device, etc., that is configured to transmit data to the device 310. The remote device 330 and the device 310 may contain hardware to enable the communication link 320, such as processors, transmitters, receivers, antennas, etc.

Further, remote device 330 may take the form of or be implemented in a computing system that is in communication with and configured to perform functions on behalf of a client device, such as computing device 310. Such a remote device 330 may receive data from another computing device 310 (e.g., an HMD 202, 252, or 272 or a mobile phone), perform certain processing functions on behalf of the device 310, and then send the resulting data back to device 310. This functionality may be referred to as “cloud” computing.

In FIG. 3A, the communication link 320 is illustrated as a wireless connection; however, wired connections may also be used. For example, the communication link 320 may be a wired serial bus such as a universal serial bus or a parallel bus. A wired connection may be a proprietary connection as well. The communication link 320 may also be a wireless connection using, e.g., Bluetooth® radio technology, communication protocols described in IEEE 802.11 (including any IEEE 802.11 revisions), cellular technology (such as GSM, CDMA, UMTS, EV-DO, WiMAX, or LTE), or Zigbee® technology, among other possibilities. The remote device 330 may be accessible via the Internet and may include a computing cluster associated with a particular web service (e.g., social-networking, photo sharing, address book, etc.).

FIG. 3B shows an example projection of UI elements described herein via an image 380 by an example head-mountable device (HMD) 352, according to an example embodiment. Other configurations of an HMD may also be used to present the UI described herein via image 380. FIG. 3B shows wearer 354 of HMD 352 looking at an eye of person 356. As such, wearer 354's gaze, or direction of viewing, is along gaze vector 360. A horizontal plane, such as horizontal gaze plane 364, can then be used to divide space into three portions: space above horizontal gaze plane 364, space in horizontal gaze plane 364, and space below horizontal gaze plane 364. In the context of projection plane 376, horizontal gaze plane 364 appears as a line that divides projection plane 376 into a subplane above the line of horizontal gaze plane 364, a subplane below the line of horizontal gaze plane 364, and the line where horizontal gaze plane 364 intersects projection plane 376. In FIG. 3B, horizontal gaze plane 364 is shown using dotted lines.

Additionally, a dividing plane, indicated using dividing line 374, can be drawn to separate space into three other portions: space to the left of the dividing plane, space on the dividing plane, and space to the right of the dividing plane. In the context of projection plane 376, the dividing plane intersects projection plane 376 at dividing line 374. Thus the dividing plane divides projection plane 376 into: a subplane to the left of dividing line 374, a subplane to the right of dividing line 374, and dividing line 374. In FIG. 3B, dividing line 374 is shown as a solid line.

Humans, such as wearer 354, when gazing in a gaze direction, may have limits on what objects can be seen above and below the gaze direction. FIG. 3B shows the upper visual plane 370 as the uppermost plane that wearer 354 can see while gazing along gaze vector 360, and shows lower visual plane 372 as the lowermost plane that wearer 354 can see while gazing along gaze vector 360. In FIG. 3B, upper visual plane 370 and lower visual plane 372 are shown using dashed lines.

The HMD can project an image for view by wearer 354 at some apparent distance 362 along display line 382, which is shown as a dotted and dashed line in FIG. 3B. For example, apparent distance 362 can be 1 meter, four feet, infinity, or some other distance. That is, HMD 352 can generate a display, such as image 380, which appears to be at the apparent distance 362 from the eye of wearer 354 and in projection plane 376. In this example, image 380 is shown between horizontal gaze plane 364 and upper visual plane 370; that is, image 380 is projected above gaze vector 360. In this example, image 380 is also projected to the right of dividing line 374. As image 380 is projected above and to the right of gaze vector 360, wearer 354 can look at person 356 without image 380 obscuring their general view. In one example, the display element of the HMD 352 is translucent when not active (i.e., when image 380 is not being displayed), and so the wearer 354 can perceive objects in the real world along the vector of display line 382.

Other example locations for displaying image 380 can be used to permit wearer 354 to look along gaze vector 360 without obscuring the view of objects along the gaze vector. For example, in some embodiments, image 380 can be projected above horizontal gaze plane 364 near and/or just above upper visual plane 370 to keep image 380 from obscuring most of wearer 354's view. Then, when wearer 354 wants to view image 380, wearer 354 can move their eyes such that their gaze is directed toward image 380.

III. Illustrative Methods

FIG. 4A is a flow chart illustrating a method 400, according to an example embodiment. Some functions of the method may be implemented while an HMD is operating in a first interface mode 402, while other functions may be implemented while the HMD is operating in a second interface mode 404.

Referring again to FIG. 4A, method 400 involves a computing device, such as an HMD or component thereof, operating in a first interface mode 402. Operation in the first interface mode 402 involves initially disabling one or more first-mode speech commands, as shown by block 406. Further, the HMD may listen for a guard phrase, as shown by block 408. If the guard phrase is detected, then the HMD responds by enabling the one or more first-mode speech commands, as shown by block 410. However, as long as the guard phrase is not detected, the first-mode speech commands remain disabled.

At a different point in time, method 400 involves operating in the second interface mode 404, instead of the first interface mode 402. Operation in the second interface mode 404 involves initially disabling one or more second-mode speech commands, as shown by block 412. Further, in the second interface mode 404, the HMD may listen for the same guard phrase as the HMD listened for in the first interface mode 402, as shown by block 414. If the guard phrase is detected, then the HMD may respond by enabling the one or more second-mode speech commands, as shown by block 416. However, as long as the guard phrase is not detected, the second-mode speech commands remain disabled.
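The flow of method 400 can be sketched as a small state machine. This is only a minimal illustration under stated assumptions: the class names `InterfaceMode` and `GuardedVoiceUI`, the exact-match comparison of phrases, and the use of “ok glass” as the guard phrase are hypothetical choices for the sketch, not details of the described method.

```python
from dataclasses import dataclass

GUARD_PHRASE = "ok glass"  # example guard phrase; any predefined phrase could be used


@dataclass
class InterfaceMode:
    """Hypothetical container for one interface mode and its speech commands."""
    name: str
    commands: frozenset
    enabled: bool = False  # commands start out disabled (blocks 406 and 412)


class GuardedVoiceUI:
    """Hypothetical controller that shares one guard phrase across modes."""

    def __init__(self, modes, initial):
        self.modes = {m.name: m for m in modes}
        self.current = self.modes[initial]
        self.current.enabled = False

    def switch_mode(self, name):
        # Entering a mode disables its commands until the guard phrase is heard.
        self.current = self.modes[name]
        self.current.enabled = False

    def on_speech(self, phrase):
        """Return the accepted command, or None if the phrase is ignored."""
        phrase = phrase.strip().lower()
        if phrase == GUARD_PHRASE:
            self.current.enabled = True  # blocks 410 and 416
            return None
        if self.current.enabled and phrase in self.current.commands:
            return phrase
        return None
```

The same guard phrase enables a different command set depending on which mode is current, which is the point of the multi-mode guard.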

In the illustrated embodiment, the HMD can switch directly back and forth between the first interface mode 402 and the second interface mode 404. It should be understood, however, that it is possible for the HMD to necessarily or optionally switch to one or more other interface modes in order to reach the second interface mode 404 from the first interface mode 402, and vice versa. Further, the HMD may switch between the first interface mode 402 and one or more other interface modes, and/or may switch between the second interface mode 404 and one or more other interface modes.

Herein, an interface mode may specify a certain way of interpreting input data received via user-input interfaces, such as microphone(s), touchpad(s) or touchscreen(s), sensor(s) arranged to detect gestures in the air, sensors arranged to provide data indicative of head movement (e.g., accelerometer(s), gyroscope(s), and/or magnetometers), an eye-tracking or gaze-tracking system configured to detect eye gestures and/or movement (e.g., winks, blinks, and/or directional movements of the eye), a keyboard, and/or a mouse, among other possibilities. A particular interface mode may also correspond to a particular way of receiving input data and/or assisting the user in providing input data that is appropriate for the mode, such as a graphical user-interface (GUI) that is designed to receive certain types of input data and/or to suggest what input data and/or operations are possible in the interface mode.

In an example embodiment, a given interface mode may specify certain voice commands that are available while in the given interface mode. A voice command that is available in a certain interface mode may or may not be immediately usable when the HMD switches to the interface mode. In particular, when an HMD switches to a new interface mode, an available voice command for the interface mode may, by default, be disabled. Accordingly, the user may be required to enable the voice command. In an example embodiment, at least the first and second interface modes each specify voice commands that can be enabled in the respective mode. However, there may be other interface modes that do not provide for any voice commands. Alternatively, in some embodiments, there may be voice commands available in every interface mode of the HMD.

To implement an interface mode in which voice commands are available, an HMD may utilize “hotword” processes or models. A hotword process may be program logic that is executed to listen for certain voice or speech commands in an incoming audio stream. Accordingly, when the HMD begins operating in the first interface mode 402, the HMD may load a hotword process for the guard phrase in order to listen for the guard phrase. Then, when the HMD detects the guard phrase (e.g., at block 408), the HMD may responsively load a hotword process or models for the one or more first-mode speech commands (e.g., at block 410). Note that in some cases, there may be a separate hotword process for each speech command. In other cases, a single hotword process may be loaded to listen for two or more speech commands.

In a further aspect, there may be a particular speech command or another type of user command that allows the user to switch from the portion of the UI associated with the first interface mode to the portion of the UI associated with the second interface mode. For example, during operation in the first interface mode 402, the HMD may detect the guard phrase followed by one of the first-mode speech commands and responsively switch to the second interface mode 404. In some implementations, the first interface mode 402 may correspond to a home-screen interface, and the one or more first-mode speech commands may correspond to one or more actions that can be initiated via the home-screen interface. Further, one of the one or more first-mode speech commands may be a speech command that can be spoken to start recording a video. As such, to cause the HMD to start recording a video, the user may say the guard phrase followed by the speech command that starts recording the video.

FIG. 4B is a flow chart illustrating another method 450, according to an example embodiment. Method 450 is an embodiment of method 400 in which hotword processes are used to detect the multi-mode guard phrase and mode-specific speech commands in the first and second interface modes 482 and 484. Further, in method 450 a time-out process is added to the second interface mode 484 as an additional protection against false-positive detections of speech commands.

Referring to FIG. 4B in greater detail, when an HMD begins operating in the first interface mode 482, the HMD enables a hotword process for the guard phrase (if it is disabled at the time), and disables the hotword process for the first-mode speech commands (if it is enabled at the time), as shown by block 452. The hotword process for the guard phrase is then used to listen for the guard phrase, as shown by block 454. If the guard phrase is detected, then the HMD enables the hotword process for one or more first-mode speech commands, as shown by block 456. The hotword process for the one or more first-mode speech commands is then used to listen for these speech commands, as shown by block 458.

In the illustrated embodiment, the first-mode speech commands include one speech command that launches a process and/or UI that corresponds to the second interface mode 484. When this particular first-mode speech command is detected, the HMD transitions to the second interface mode 484, as represented by the arrow from block 458 to block 460. Note that while transitions to other interface modes are not shown in FIG. 4B, an HMD might also be configured to transition to other interface modes in response to detecting other first-mode speech commands.

When the HMD begins operating in the second interface mode 484, the HMD enables the hotword process for the guard phrase (if it is not already enabled at the time), and disables the hotword process for the second-mode speech command (if it is enabled at the time), as shown by block 460. The hotword process for the guard phrase is again used to listen for the guard phrase, as shown by block 462. If the guard phrase is detected, then the HMD enables the hotword process for the second-mode speech command, as shown by block 464. The hotword process for the second-mode speech command is then used to listen for this speech command, as shown by block 466.

In a further aspect of the second interface mode 484, when the HMD detects the guard phrase, the HMD may also implement a time-out process in an effort to further protect against false positives. For example, at or near the time when the HMD detects the guard phrase, the HMD may start a timer. Accordingly, the HMD may then continue to listen for the second-mode speech command, at block 466, for the duration of the timer (which may also be referred to as the “timeout period”). If the HMD detects the second-mode speech command before the timeout period elapses, the HMD initiates a process corresponding to the second-mode speech command, as shown by block 470. However, if the second-mode speech command has not been detected, and the HMD determines at block 468 that the timeout period has elapsed, then the HMD repeats block 460 in order to enable the hotword process for the guard phrase (if it is not already enabled at the time), and disable the hotword process for the second-mode speech command.
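The second-mode portion of method 450, including the time-out, might be sketched as follows. This is an illustrative sketch only: the class name `SecondModeGuard`, the injectable `clock` parameter, the 5-second default, and the choice to return to the block 460 state after a command is accepted are all assumptions made for the example. The sketch also keeps the guard-phrase process enabled throughout, which is just one of the possible variations.

```python
import time


class SecondModeGuard:
    """Hypothetical sketch of blocks 460-470 with an injectable clock."""

    def __init__(self, command="stop recording", guard="ok glass",
                 timeout_s=5.0, clock=time.monotonic):
        self.command = command
        self.guard = guard
        self.timeout_s = timeout_s
        self.clock = clock
        self.reset()

    def reset(self):
        # Block 460: guard hotword process on, command hotword process off.
        self.guard_enabled = True
        self.command_enabled = False
        self.deadline = None

    def on_speech(self, phrase):
        """Return True when the guarded command is accepted (block 470)."""
        now = self.clock()
        # Block 468: once the timeout period has elapsed, fall back to block 460.
        if self.command_enabled and now >= self.deadline:
            self.reset()
        phrase = phrase.strip().lower()
        if self.guard_enabled and phrase == self.guard:
            # Block 464: enable the command process and start the timer.
            self.command_enabled = True
            self.deadline = now + self.timeout_s
            return False
        if self.command_enabled and phrase == self.command:
            # Assumption: after accepting the command, return to the guarded state.
            self.reset()
            return True
        return False
```

Passing the clock in as a parameter keeps the time-out logic testable without real delays; a device implementation would more likely rely on a timer callback.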

Note that in an example embodiment, there may only be one hotword process for speech commands available at a given point in time (i.e., the hotword process for speech commands that are available in the current interface mode). In such an embodiment, the hotword process for the first-mode speech commands will already be disabled when the HMD switches to the first interface mode from another interface mode. Thus, if the hotword process for the first-mode speech commands is already disabled when the HMD carries out block 452, then the HMD may not need to take any action to disable the hotword process for the first-mode speech commands. Similarly, if the hotword process for the second-mode speech commands is already disabled when the HMD carries out block 460, then the HMD may not need to take any action to disable the hotword process for the second-mode speech commands.

Further, in some embodiments, the HMD may enable the hotword process for the guard phrase so long as some speech commands are available in whichever interface mode the HMD is operating in. Accordingly, the hotword process for the guard phrase may remain enabled as the HMD switches between interface modes where voice commands are provided. For instance, the hotword process for the guard phrase may be enabled in the first interface mode 482, and may remain enabled when the HMD switches from the first interface mode 482 to the second interface mode 484. Thus, if the hotword process for the guard phrase is already enabled when the HMD carries out block 452 and/or block 460, then the HMD may not need to take any action to enable the hotword process for the guard phrase.

In some embodiments, the hotword process for the guard phrase may be kept enabled after the guard phrase is detected, at the same time as the hotword process for speech commands is enabled (or, alternatively, the hotword process for speech commands may simply include both the guard phrase and the speech commands). Further, in some cases, the hotword process for the guard phrase may be permanently loaded, such that the HMD is always listening for the guard phrase. Alternatively, the HMD may disable the hotword process for the guard phrase when the hotword process for speech commands is enabled and/or at other times when the HMD does not need to listen for the guard phrase (e.g., when the HMD is operating in an interface mode where no speech commands are available and/or where available speech commands are always enabled).

In a further aspect, an HMD may also provide visual cues for a voice UI. As such, when the hotword process for the guard phrase is enabled, such as at block 452 or block 460, method 450 may further involve the HMD displaying a visual cue that is indicative of the guard phrase. Additionally or alternatively, after the guard phrase is detected in a given interface mode, method 450 may further involve the HMD displaying one or more visual cues corresponding to the one or more speech commands that have been enabled. For example, at block 456, the HMD may display visual cues that correspond to the first-mode speech commands, and at block 464, the HMD may display a visual cue that corresponds to the second-mode speech command. Other examples are also possible.

Note that in method 450, the second interface mode 484 uses the guard phrase to protect a single voice command. In some ways, this may be regarded as counterintuitive, as it might be perceived as an extra step that could annoy a user. However, as will be described below in greater detail, a carefully chosen guard phrase may alleviate the perception of a guard phrase as an extra step. More specifically, in some embodiments, the guard phrase may be selected such that a user may perceive the guard phrase and a subsequent speech command as a single command, even though the HMD is detecting them separately.

IV. Illustrative Device Functionality

FIGS. 5A to 5C illustrate applications of a multi-mode guard phrase, according to example embodiments. In order to provide a voice UI with a multi-mode guard phrase, these applications may utilize methods such as those described in reference to FIGS. 4A and 4B. However, other techniques may also be used to provide the UI functionality shown in FIGS. 5A to 5C.

FIG. 5A shows an application that involves a home-screen interface 501 and a video-recording interface 503. More specifically, an HMD may operate in a home-screen mode 501, where certain speech commands can be enabled by saying the guard phrase “ok glass.” This may be implemented by loading a hotword process that listens only for the phrase “ok glass.” While in the home-screen mode 501, the HMD may display “ok glass” as a visual cue that the guard phrase can be used to enable speech interaction, as shown by screen view 500.

When the HMD detects that the wearer has said “ok glass,” the HMD may enable the speech commands for the home-screen mode 501. To do so, the HMD may load a hotword process that listens for the speech commands that are useful in home-screen mode 501. Further, the HMD may display visual cues indicating the speech commands that have been enabled. For example, as shown by screen view 502, the HMD may display visual cues corresponding to speech commands, such as “navigate to”, “take a photo”, “record a video”, “send message to”, etc.

After enabling the speech commands in home-screen mode 501, the HMD may switch to a different interface mode in response to one of the speech commands. For instance, as shown in FIG. 5A, if the user says “record a video,” the HMD may switch from home-screen mode 501 to a video-recording mode 503. Further, in some embodiments, different speech commands could be used to initiate different functions and/or launch other applications, which may each have their own corresponding interface mode. As examples, a “send message to” command could switch to a text-message or e-mail interface mode, and a “make a call” command could switch to a phone-call interface mode. Other examples are also possible.
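One way to picture these command-driven transitions is a lookup table from enabled home-screen commands to target modes. The table name, the mode identifier strings, and the prefix matching (so that “send message to” can be followed by a recipient) are hypothetical choices for this sketch.

```python
# Hypothetical mapping from home-screen speech commands to target interface modes.
HOME_SCREEN_TRANSITIONS = {
    "record a video": "video-recording",
    "send message to": "text-message",
    "make a call": "phone-call",
}


def next_mode(current_mode, command):
    """Return the mode to switch to, or the current mode if no transition applies."""
    if current_mode != "home-screen":
        return current_mode
    spoken = command.strip().lower()
    for prefix, target in HOME_SCREEN_TRANSITIONS.items():
        # Prefix match so a command may carry an argument, e.g. a recipient.
        if spoken.startswith(prefix):
            return target
    return current_mode
```

Commands that act within the home screen without changing modes simply fall through and leave the current mode unchanged.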

When the HMD switches to video-recording mode 503, the HMD may launch a camera application and automatically start recording video. Further, a hotword process for video-recording mode 503 may provide for a single speech command to stop the video recording, which is also guarded by the “ok glass” guard phrase. To indicate that a speech command can be enabled, the HMD may display a visual cue. For example, in screen view 504, the HMD displays the guard phrase, “ok glass,” to indicate that the user can enable voice interaction by saying “ok glass.” Other examples are also possible.

When the HMD is in video-recording mode 503, and detects that the wearer has said “ok glass,” the HMD may enable the hotword process that listens for the single speech command that is available in the video-recording mode 503; e.g., a “stop recording” speech command. Further, the HMD may provide a visual cue that the “stop recording” speech command can be used. For example, the HMD may update the display so that “stop recording” follows “ok glass,” as shown by screen view 506. The wearer can then say “stop recording” to cause the camera application to cease recording video (as shown in screen view 508, where an indicator in the lower right may stop blinking to indicate that video is not being captured).

As noted above, some interface modes may provide an additional guard against false-positive detection of speech commands by utilizing a timeout process to disable speech command(s) when no speech command is detected within a certain period of time after detecting the guard phrase. Further, the implementation of a time-out process may vary between modes.

For instance, consider the example in FIG. 5A with a home-screen mode and a video-recording mode. It might be the case that users typically go to the home screen with a specific task in mind, and thus are only there for a short time (e.g., 15 seconds or less). However, it might also be the case that users typically record video for a longer period of time (e.g., 1 to 10 minutes). If a user typically stays on the home screen for less time than the user spends recording a video, there is less opportunity for a false detection at the home screen, so the probability of incorrectly concluding the user has said “ok glass” while at the home screen may be less than the probability of such a false positive while recording video. Accordingly, a shorter time-out period may be implemented in the video-recording mode (e.g., 5 seconds) and a longer time-out period (or possibly no time-out period) may be implemented in the home-screen mode.
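Per-mode time-out periods could be kept in a simple table. In this sketch, the table name, the mode identifier strings, and the convention that `None` means “no time-out” are assumptions; the 5-second value is the example figure given above for the video-recording mode.

```python
# Hypothetical per-mode timeout table; None means no timeout is applied.
TIMEOUT_SECONDS = {
    "home-screen": None,
    "video-recording": 5.0,
}


def commands_still_enabled(mode, seconds_since_guard):
    """Return True if speech commands enabled by the guard phrase remain enabled."""
    # Modes missing from the table are treated as having no timeout.
    timeout = TIMEOUT_SECONDS.get(mode)
    if timeout is None:
        return True
    return seconds_since_guard < timeout
```

Keeping the values in one table makes it straightforward to tune each mode separately as typical dwell times in that mode become better understood.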

In yet a further aspect, the visual cues for the same guard phrase may vary between interface modes. Additionally or alternatively, the visual cues for enabled speech commands may be formatted for display in different ways, depending upon the particular interface mode. For instance, a home screen may serve as a launch point to get to other modes, and thus it may be acceptable for visual cues to be displayed such that they take up more space and/or are more central in the HMD wearer's field of view. During video recording, however, it may be desirable for the visual cues to be less obtrusive, so that the wearer can see through the display to their environment. In particular, since the HMD may not have an actual viewfinder for a point-of-view (POV) camera, the wearer may assume that their field of view is roughly what the POV camera is capturing on video, which makes an unobstructed view through the display desirable.

Thus, in home-screen mode 501, the “ok glass” visual cue may be larger and centrally placed, as shown in screen view 500. Further, the visual cues for speech commands shown in screen view 502 are displayed in a menu form that occupies a significant amount of the display. In video-recording mode 503, however, the visual cues for the “ok glass” guard phrase and the “stop recording” speech command are smaller and displayed at the lower edge of the display, as shown in screen views 504 and 506. (Note that the locations may vary; for instance, the visual cues for the “ok glass” guard phrase and the “stop recording” speech command could also be displayed at the upper edge of the display, or elsewhere.)

Note that if the user thinks of “glass” as the name or type of their device, then the phrase “ok glass” may be a particularly good choice for a guard phrase. In particular, the phrase “ok glass” may feel like something the user would say in order to address the device, in the same manner as they might address a person to whom they are speaking. As such, while the computing device may treat the guard phrase and a subsequent speech command as separate voice commands, the combination of the guard phrase and a speech command may feel to the user like a single command in which they address their computing device and tell the device what they want it to do. For instance, in the home-screen mode 501, the user may say “ok glass, record a video,” which may feel to the user like a single voice command, even though the hotword process to detect “record a video” is not loaded until the user says “ok glass.” Thus, with the use of a guard phrase such as “ok glass,” the user may not even be aware that the guard phrase is required, or at least may find the need to say a guard phrase less cumbersome.

Other guard phrases may enhance the user experience in a way similar to “ok glass.” Generally, any guard phrase that a user may perceive as addressing or conversing with their device could similarly enhance the user experience. For example, a guard phrase that includes a name for the device, which may be predefined or created by the user, may have a similar effect on the user experience. Other examples are also possible.

In a further aspect, an example computing device may be configured to remove a guard phrase, such as “ok glass”, and/or to remove a “stop recording” speech command, from the audio portion of a video. More specifically, when a user says “ok glass, stop recording,” or something to that effect, in order to stop recording a video, the audio portion of the video may include the speech command. However, in many instances, the user may not want the speech command to be included in the audio. Accordingly, an HMD may be configured to at least partially remove the guard phrase and/or the stop-recording command from the audio portion of a captured video.

The HMD may use various techniques in an effort to remove the guard phrase and/or the stop-recording command from the audio portion of a captured video. For example, the HMD could simply trim the portion of the video that includes the guard phrase and/or the stop-recording command. The HMD could also trim just the audio, without trimming the entire video (e.g., so that the video is silent during the time when the user says the guard phrase and/or the stop-recording command).
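The two trimming options can be illustrated on a plain list of audio samples. This is a sketch under assumptions: the function names are hypothetical, the audio is treated as a mono sequence of numeric samples, and the start and end times of the spoken phrase are assumed to be known already (e.g., from the hotword detection).

```python
def silence_span(samples, sample_rate, start_s, end_s):
    """Return a copy of samples with the span [start_s, end_s) replaced by silence."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    out = list(samples)
    for i in range(start, end):
        out[i] = 0  # silence only the audio; the video frames would be kept
    return out


def trim_tail(samples, sample_rate, start_s):
    """Return a copy of samples cut off at start_s, dropping everything after it."""
    cut = max(0, min(len(samples), int(start_s * sample_rate)))
    return list(samples[:cut])
```

Because the stop-recording command comes at the end of a recording, cutting the tail may be enough when trimming the whole video is acceptable, whereas silencing preserves the full video length.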

As another example, the HMD could use a technique such as spectral subtraction to remove the guard phrase and/or the stop-recording command from the video, without trimming the audio. For example, the HMD may store a previous instance or instances when the user says the guard phrase and/or the stop-recording command, and create model speech signals for the guard phrase and/or the stop-recording command. The HMD may then subtract the model speech signals from the audio portion of a video to at least partially remove the guard phrase and/or the stop-recording command from the audio (while hopefully leaving a significant portion of other audio intact). Other techniques are also possible.
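A toy, single-frame version of magnitude spectral subtraction is sketched below to show the basic idea. It is not presented as the technique an HMD would actually use: the function names are hypothetical, a naive DFT is used for self-containment, the model signal is assumed to be time-aligned with the frame and of the same length, and a practical system would process overlapping windowed frames.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame, model_frame):
    """Subtract the model's magnitude spectrum from the frame's, keeping the frame's phase."""
    frame_spec = dft(frame)
    model_mag = [abs(c) for c in dft(model_frame)]
    cleaned = []
    for c, m in zip(frame_spec, model_mag):
        mag = max(abs(c) - m, 0.0)  # floor at zero to avoid negative magnitudes
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)
```

When the frame contains only the modeled phrase, the output is essentially silent; when the model is empty, the frame passes through unchanged. Real recordings fall in between, which is why the removal is described as partial.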

Many other interface modes with guard-phrase protected speech commands are possible. For example, FIG. 5B shows an application that involves an incoming-call mode 511 and another interface mode 513. In some cases, the other interface mode 513 may be the home-screen mode 501 described in reference to FIG. 5A; but in other cases, it could be a different interface mode. In any case, when the HMD is operating in the other interface mode 513 and receives an incoming call, the HMD may responsively switch to incoming-call mode 511.

There may be one or more speech commands available in the incoming-call mode 511, which can be enabled via the “ok glass” guard phrase. To indicate that a speech command can be enabled, the HMD may display a visual cue. Thus, in the illustrated example, the HMD displays the guard phrase “ok glass,” as shown by screen view 510. When the HMD detects speech that includes “ok glass,” the HMD may display visual cues to indicate the particular speech commands that have been enabled. For example, in the illustrated embodiment, the HMD displays visual cues indicating that the speech commands “answer call,” “send to voicemail,” and “send busy message” can be utilized, as shown by screen view 512.

FIG. 5C provides another example of an interface mode with guard-phrase protected speech commands. In particular, FIG. 5C shows an application that involves an incoming-message mode 521 and another interface mode 523. In some cases, the other interface mode 523 may be the home-screen mode 501 described in reference to FIG. 5A; but in other cases, it could be a different interface mode. In any case, when the HMD is operating in the other interface mode 523 and receives an incoming message, such as a text message or an e-mail, the HMD may responsively switch to incoming-message mode 521.

There may be one or more speech commands available in the incoming-message mode 521, which can be enabled via the “ok glass” guard phrase. To indicate that the one or more speech commands can be enabled, the HMD may display one or more visual cues. Thus, in the illustrated example, the HMD displays the guard phrase “ok glass,” as shown by screen view 520. Then, when the HMD detects speech that includes “ok glass,” the HMD may display visual cues to indicate the particular speech commands that have been enabled. In the illustrated embodiment, the HMD displays visual cues indicating that the speech commands “show the message,” “reply to the message,” and “delete the message” can be utilized, as shown by screen view 522.
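The behavior of the incoming-call and incoming-message modes can be sketched as a small state machine in which the same guard phrase enables a different command set depending on the current mode. The class name, method names, and mode identifiers below are hypothetical, and the sketch assumes that speech has already been transcribed to text. It also assumes that enabled commands stay enabled until the mode changes, which this passage does not specify.

```python
GUARD_PHRASE = "ok glass"

# Speech commands protected by the guard phrase in each mode
# (command names taken from the FIG. 5B and FIG. 5C examples).
MODE_COMMANDS = {
    "incoming-call": ["answer call", "send to voicemail", "send busy message"],
    "incoming-message": ["show the message", "reply to the message",
                         "delete the message"],
}


class GuardedVoiceInterface:
    def __init__(self, mode):
        self.mode = mode
        self.enabled = []  # commands are initially disabled

    def switch_mode(self, mode):
        """Enter a new interface mode with its commands disabled."""
        self.mode = mode
        self.enabled = []

    def visual_cues(self):
        """Show the guard phrase until commands are enabled, then the commands."""
        return list(self.enabled) if self.enabled else [GUARD_PHRASE]

    def on_speech(self, text):
        """Return the command to perform, or None if nothing is triggered."""
        text = text.lower().strip()
        if text in self.enabled:
            return text
        if GUARD_PHRASE in text:
            # Same guard phrase, but the enabled set depends on the mode.
            self.enabled = list(MODE_COMMANDS.get(self.mode, []))
        return None
```

For example, after `switch_mode("incoming-call")`, saying “answer call” does nothing until “ok glass” has been detected, at which point the cues change from the guard phrase to the three call-handling commands.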

V. Conclusion

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

We claim:
 1. A method implemented by one or more processors of a computing device, the method comprising: transitioning the computing device into a given interface mode in response to receiving sensor data captured by at least one sensor of the computing device; in response to the computing device being in the given interface mode: activating one or more speech commands that are specific to the given interface mode, wherein the given interface mode corresponds to a current state of a user interface of the computing device, and wherein the given interface mode is one of multiple interface modes each corresponding to corresponding alternate states of the user interface; while the computing device is in the given interface mode and while the one or more speech commands are activated: receiving audio data captured by at least one audio sensor of the computing device; analyzing, based on the one or more speech commands being activated, the audio data to determine whether any of the one or more speech commands are included in the audio data; in response to determining a given speech command, of the one or more speech commands, is included in the audio data without a predefined guard phrase, and in response to determining that the audio data is received within a predetermined period of time of transitioning the computing device into the given interface mode: performing one or more actions, via the computing device, that correspond to the given speech command.
 2. The method of claim 1, further comprising: while the computing device is in the given interface mode and while the one or more speech commands are activated: determining the predetermined period of time has lapsed; and in response to determining the predetermined period of time has lapsed, deactivating one or more of the speech commands that are specific to the given interface mode.
 3. The method of claim 2, further comprising: while the computing device is in the given interface mode and while the one or more speech commands are deactivated: in response to determining the given speech command, of the one or more speech commands, is included in the audio data without the predefined guard phrase, and in response to determining that the audio data is not received within the predetermined period of time: refraining from performing one or more of the actions, via the computing device, that correspond to the given speech command.
 4. The method of claim 3, further comprising: subsequent to refraining from performing one or more of the actions, via the computing device, that correspond to the given speech command: receiving additional audio data captured by the at least one audio sensor of the computing device, the additional audio data including at least the predefined guard phrase; and re-activating one or more of the speech commands that are specific to the given interface mode.
 5. The method of claim 1, further comprising: while the computing device is in the given interface mode and while the one or more speech commands are activated: displaying a visual cue for one or more of the speech commands via a display of the computing device.
 6. The method of claim 5, further comprising: while the computing device is in the given interface mode and while the one or more speech commands are activated: determining the predetermined period of time has lapsed; and in response to determining the predetermined period of time has lapsed, causing the visual cue for the one or more speech commands to be removed from the display of the computing device.
 7. The method of claim 1, wherein activating the one or more speech commands that are specific to the given interface mode comprises loading, at the computing device, at least one hotword process for the one or more speech commands; and wherein analyzing, based on the one or more speech commands being activated, the audio data to determine whether any of the one or more speech commands are included in the audio data comprises: analyzing the audio data using the at least one hotword process.
 8. The method of claim 1, wherein the sensor data captured by the at least one sensor of the computing device comprises preceding audio data captured by the at least one audio sensor of the computing device, the preceding audio data including at least the predefined guard phrase and an indication of the given interface mode.
 9. The method of claim 1, wherein the predetermined period of time is specific to the given interface mode.
 10. The method of claim 1, wherein activating one or more of the speech commands is based on the first audio data.
 11. The method of claim 1, wherein the computing device is a head-mountable device.
 12. A computing device including memory and one or more processors configured to execute instructions stored in memory, the instructions comprising instructions to: transition the computing device into a given interface mode in response to receiving sensor data captured by at least one sensor of the computing device; in response to the computing device being in the given interface mode: activate one or more speech commands that are specific to the given interface mode, wherein the given interface mode corresponds to a current state of a user interface of the computing device, and wherein the given interface mode is one of multiple interface modes each corresponding to corresponding alternate states of the user interface; while the computing device is in the given interface mode and while the one or more speech commands are activated: receive audio data captured by at least one audio sensor of the computing device; analyze, based on the one or more speech commands being activated, the audio data to determine whether any of the one or more speech commands are included in the audio data; in response to determining a given speech command, of the one or more speech commands, is included in the audio data without a predefined guard phrase, and in response to determining that the audio data is received within a predetermined period of time of transitioning the computing device into the given interface mode: perform one or more actions, via the computing device, that correspond to the given speech command.
 13. The computing device of claim 12, wherein the instructions further comprise instructions to: while the computing device is in the given interface mode and while the one or more speech commands are activated: determine the predetermined period of time has lapsed; and in response to determining the predetermined period of time has lapsed, deactivate one or more of the speech commands that are specific to the given interface mode.
 14. The computing device of claim 13, wherein the instructions further comprise instructions to: while the computing device is in the given interface mode and while the one or more speech commands are deactivated: in response to determining the given speech command, of the one or more speech commands, is included in the audio data without the predefined guard phrase, and in response to determining that the audio data is not received within the predetermined period of time: refrain from performing one or more of the actions, via the computing device, that correspond to the given speech command.
 15. The computing device of claim 14, wherein the instructions further comprise instructions to: subsequent to refraining from performing one or more of the actions, via the computing device, that correspond to the given speech command: receive additional audio data captured by the at least one audio sensor of the computing device, the additional audio data including at least the predefined guard phrase; and re-activate one or more of the speech commands that are specific to the given interface mode.
 16. The computing device of claim 12, wherein the instructions further comprise instructions to: while the computing device is in the given interface mode and while the one or more speech commands are activated: display a visual cue for one or more of the speech commands via a display of the computing device.
 17. The computing device of claim 16, wherein the instructions further comprise instructions to: while the computing device is in the given interface mode and while the one or more speech commands are activated: determine the predetermined period of time has lapsed; and in response to determining the predetermined period of time has lapsed, cause the visual cue for the one or more speech commands to be removed from the display of the computing device.
 18. The computing device of claim 12, wherein the instructions to activate the one or more speech commands that are specific to the given interface mode comprise instructions to load, at the computing device, at least one hotword process for the one or more speech commands; and wherein the instructions to analyze, based on the one or more speech commands being activated, the audio data to determine whether any of the one or more speech commands are included in the second audio data comprise instructions to: analyze the audio data using the at least one hotword process.
 19. The computing device of claim 12, wherein the sensor data captured by the at least one sensor of the computing device comprises preceding audio data captured by the at least one audio sensor of the computing device, the preceding audio data including at least the predefined guard phrase and an indication of the given interface mode.
 20. A non-transitory computer-readable storage medium storing instructions executable by at least one processor, the instructions including instructions to: transition the computing device into a given interface mode in response to receiving sensor data captured by at least one sensor of the computing device; in response to the computing device being in the given interface mode: activate one or more speech commands that are specific to the given interface mode, wherein the given interface mode corresponds to a current state of a user interface of the computing device, and wherein the given interface mode is one of multiple interface modes each corresponding to corresponding alternate states of the user interface; while the computing device is in the given interface mode and while the one or more speech commands are activated: receive audio data captured by at least one audio sensor of the computing device; analyze, based on the one or more speech commands being activated, the audio data to determine whether any of the one or more speech commands are included in the audio data; in response to determining a given speech command, of the one or more speech commands, is included in the audio data without a predefined guard phrase, and in response to determining that the audio data is received within a predetermined period of time of transitioning the computing device into the given interface mode: perform one or more actions, via the computing device, that correspond to the given speech command.